Skip to content

fix(js): preserve real exit code when test reset/parent-stream closure races in-flight commands (#170)#171

Merged
konard merged 6 commits into
mainfrom
issue-170-9e2e62bb506a
Jun 11, 2026
Merged

fix(js): preserve real exit code when test reset/parent-stream closure races in-flight commands (#170)#171
konard merged 6 commits into
mainfrom
issue-170-9e2e62bb506a

Conversation

@konard

@konard konard commented Jun 10, 2026

Copy link
Copy Markdown
Member

Summary

Fixes #170 — a CI/CD false positive on Test JavaScript (bun on windows-latest)
(run 27310950658).

The test Shell Settings > Shell Replacement Benefits > should provide better error objects than bash intermittently reported exit code 143 with stderr
"Process killed with SIGTERM" instead of the command's real exit code 5.
The same run also emitted MaxListenersExceededWarning: … 11 close listeners.

These are flaky/quality defects in the library's own source — not in any workflow
file. Full analysis (timeline, requirements, root causes, verification, workflow
comparison) lives in docs/case-studies/issue-170/.

Root causes

The unifying cause is a global teardown preempting a command that user code is
actively awaiting
, replacing its real exit code with a SIGTERM (143). Two
distinct teardown paths reach an in-flight, awaited runner during a
resetGlobalState():

  1. Parent-stream-closure handler (the path that actually broke CI).
    monitorParentStreams()'s 'close' listener calls
    _handleParentStreamClosure() on every active runner, which aborts and kills
    its child. On Windows/Bun the parent WriteStream emits a spurious
    'close' (the same instability behind the MaxListeners warning), preempting
    the awaited command. A reaper-only fix (commit 9fd7ad2) left this path open
    and CI still failed — see the case study §3.3.
  2. Reaper race. cleanupActiveRunners() (run in beforeEach/afterEach)
    force-killed a still-awaited command; the synthetic SIGTERM from killRunner()
    won (finish() is first-wins) and masked the real code.
  3. Listener leak. monitorParentStreams() added a 'close' listener on every
    ProcessRunner construction, but resetGlobalState() cleared the
    parentStreamsMonitored flag without removing the listeners — so they
    accumulated until the MaxListeners warning. This warning is the smoking gun for
    the spurious-close instability behind root cause $fy tool (sh to mjs translator) #1.

The fix (applied across the whole codebase)

A new _awaited flag is set synchronously the moment user code consumes a
runner (then/catch/finally/stream). Both teardown paths honor it:

  • _handleParentStreamClosure() ignores parent-stream closure for awaited
    runners — the await is the authoritative consumer. Fire-and-forget/streamed
    runners (runner.start() without awaiting the runner) are unaffected, so the
    legitimate graceful-shutdown path is preserved.
  • cleanupActiveRunners() skips live, awaited runners while still reaping
    genuinely leaked fire-and-forget ones.
  • Parent-stream 'close' listeners are now tracked and removed in
    resetGlobalState() / resetParentStreamMonitoring(), bounding the count.

How to reproduce

node experiments/repro-issue-170-parentclose.mjs  # Path B (the CI trigger): clean → code=143, fixed → code=5
node experiments/repro-issue-170-awaited.mjs       # Path A (reaper race):    clean → code=143, fixed → code=5

Tests

  • New regression test js/tests/issue-170-cleanup-race.test.mjs encodes all
    three defects. The parent-closure case was verified to go 143 without the
    guard → 5 with it
    .
  • Test JavaScript (bun on windows-latest) — the originally failing leg — now
    passes on this branch (CI run on commit 1c53eef).
  • Lint + Prettier clean on all changed files.

JS/Rust parity

This change is JS-only. The Rust implementation's reset() (rust/src/state.rs)
merely clears the active-runner set; it has no parent-stream-closure monitoring,
no reaper kill, and no synthetic-SIGTERM result — so none of the three defects has
a Rust counterpart. The PR carries the parity-exempt label per the parity gate's
documented escape hatch.

Workflow comparison (issue requirement R2)

Compared .github/workflows/*.yml against the js/rust/python/csharp pipeline
templates. The failing defect lives in command-stream's library source, which has
no counterpart in the templates, so there is no matching issue to file
upstream. Best-practice deltas (Deno matrix leg, fresh-merge simulation) are
documented as follow-up recommendations in the case study.

A changeset (js/.changeset/issue-170-cleanup-race.md, patch) is included to
trigger the next release.

Adding .gitkeep for PR creation (default mode).
This file will be removed when the task is complete.

Issue: #170
@konard konard self-assigned this Jun 10, 2026
konard added 2 commits June 11, 2026 00:23
…ands (#170)

The CI test 'should provide better error objects than bash' intermittently
reported exit code 143 / stderr 'Process killed with SIGTERM' instead of the
real exit code (Windows/Bun, run 27310950658). Two defects in the global
test-isolation reset were responsible:

1. cleanupActiveRunners() force-killed commands that user code was still
   awaiting, replacing the real exit code with a synthetic SIGTERM result.
   The reaper now skips awaited, not-yet-finished runners. A new _awaited
   flag is set synchronously when user code starts consuming a runner
   (await/then/catch/finally/stream).

2. monitorParentStreams() leaked a 'close' listener on process.stdout/stderr
   on every ProcessRunner construction (resetGlobalState cleared the flag but
   not the listeners), eventually triggering a MaxListenersExceeded warning.
   Listeners are now tracked and removed on reset.

Adds js/tests/issue-170-cleanup-race.test.mjs reproducing both defects
(3 fail on clean code, 3 pass with the fix) and experiments/repro-issue-170-awaited.mjs.
…ison

Captures the failing run 27310950658 log, the JS pipeline-template workflows
used for comparison, and a full analysis: timeline, requirements, root-cause
of both defects (reaper race + parent-stream listener leak), the fix, the
workflow comparison (no matching defect in templates), and verification.
@konard konard changed the title [WIP] Check CI/CD for all false positives and errors and fix it all fix(js): preserve real exit code when test reset races in-flight commands (#170) Jun 11, 2026
@konard konard marked this pull request as ready for review June 11, 2026 00:30
konard added 2 commits June 11, 2026 00:43
The real CI trigger (run 27310950658, Windows/Bun) was a spurious 'close'
event on the parent stdout WriteStream while an errexit command was in
flight and being awaited. monitorParentStreams' listener fired
_handleParentStreamClosure(), which killed the live child and replaced
the real exit code (5) with a synthetic SIGTERM (143), failing the
'should provide better error objects than bash' test.

The earlier reaper/listener-leak fixes were necessary but insufficient:
the kill came from the parent-closure handler, not cleanupActiveRunners.

Guard _handleParentStreamClosure() on this._awaited so graceful shutdown
on parent-stream closure only applies to fire-and-forget/streamed runners
whose output consumer went away — never to a command the user is awaiting,
where the await is the authoritative consumer.

Adds a regression test (verified: 143 without the guard, 5 with it) and a
standalone repro under experiments/.
Update the case study and changeset to reflect that the reaper-only fix
(9fd7ad2) was insufficient: CI still failed because the parent-stream
closure handler (_handleParentStreamClosure) was the path that preempted
the awaited command on Windows/Bun. Documents both teardown paths (reaper
+ parent-closure), the MaxListeners warning as the smoking gun for the
spurious WriteStream close, and the unified _awaited-based fix.
@konard konard added the parity-exempt Change is intentionally single-language; skip JS/Rust parity check label Jun 11, 2026
The Rust implementation's reset() only clears the active-runner set and
has no parent-stream-closure/synthetic-SIGTERM machinery, so none of the
three JS defects has a Rust counterpart. The PR is labeled parity-exempt
per the gate's documented escape hatch for single-language changes.
@konard konard changed the title fix(js): preserve real exit code when test reset races in-flight commands (#170) fix(js): preserve real exit code when test reset/parent-stream closure races in-flight commands (#170) Jun 11, 2026
@konard

konard commented Jun 11, 2026

Copy link
Copy Markdown
Member Author

Working session summary

All checks are green or appropriately skipped. The work for issue #170 is complete.

Summary

The CI failure in issue #170 (run 27310950658, Test JavaScript (bun on windows-latest)) was a global teardown preempting a command that user code was actively awaiting, replacing its real exit code (5) with a SIGTERM (143). I found and fixed three defects, all keyed on a new _awaited flag set synchronously when user code consumes a runner:

  1. Parent-stream-closure handler (the actual CI trigger). A spurious 'close' on the Windows/Bun parent WriteStream fired _handleParentStreamClosure(), killing the awaited in-flight command. Critically, my first fix (commit 9fd7ad2, reaper-only) did not stop the failure — CI re-ran and failed identically, which is how I traced the real path. The guard now ignores parent-stream closure for awaited runners while preserving graceful shutdown for fire-and-forget/streamed ones.
  2. Reaper racecleanupActiveRunners() skips awaited, unfinished runners.
  3. Listener leak — parent-stream 'close' listeners are now tracked and removed on reset (eliminating the MaxListenersExceeded warning, which was the smoking gun for defect $fy tool (sh to mjs translator) #1).

Verification:

  • Test JavaScript (bun on windows-latest) — the originally failing leg — now passes on the branch (and all other legs pass).
  • Regression test js/tests/issue-170-cleanup-race.test.mjs (4 pass); the parent-closure case verified to fail with 143 when the guard is removed.
  • Two standalone repros under experiments/ (Path A reaper, Path B parent-closure), both clean→143 / fixed→5 on node and bun.
  • Lint + Prettier clean; main is merged into the branch.

Other requirements:

  • R2/R5 (templates & external repos): the defect lives in command-stream's own library source; templates have no ProcessRunner/monitorParentStreams counterpart, and the Rust implementation's reset() only clears its runner set (no kill, no parent-closure monitoring) — so there's nothing to mirror or report upstream. The PR carries the documented parity-exempt label.
  • R3: deep case study in docs/case-studies/issue-170/README.md (timeline, both failed runs, three root causes, fix rationale, rejected alternatives, parity finding).
  • R7: all work is in the existing PR #171, title/body updated, changeset included.

PR: #171


This summary was automatically extracted from the AI working session output.

@konard

konard commented Jun 11, 2026

Copy link
Copy Markdown
Member Author

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost: $16.379092

📊 Context and tokens usage:

Claude Opus 4.8: (4 sub-sessions)

  1. 113.3K / 1M (11%) input tokens, 45.2K / 128K (35%) output tokens
  2. 115.7K / 1M (12%) input tokens, 48.2K / 128K (38%) output tokens
  3. 115.4K / 1M (12%) input tokens, 38.6K / 128K (30%) output tokens
  4. 91.5K / 1M (9%) input tokens, 30.2K / 128K (24%) output tokens

Total: (32.8K new + 455.9K cache writes + 17.6M cache reads) input tokens, 182.5K output tokens, $16.379092 cost

🤖 Models used:

  • Tool: Anthropic Claude Code
  • Requested: opus
  • Model: Claude Opus 4.8 (claude-opus-4-8)

📎 Log file uploaded as Gist (6872KB)


Now working session is ended, feel free to review and add any feedback on the solution draft.

@konard konard merged commit d846c98 into main Jun 11, 2026
12 checks passed
@konard

konard commented Jun 11, 2026

Copy link
Copy Markdown
Member Author

🎉 Auto-merged

This pull request has been automatically merged by hive-mind.

  • All CI checks have passed

Auto-merged by hive-mind with --auto-merge flag

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

parity-exempt Change is intentionally single-language; skip JS/Rust parity check

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Check CI/CD for all false positives and errors and fix it all

1 participant